Paraphrase Identification using Machine Learning Techniques

Authors

  • A. CHITRA
  • C. S. SARAVANA KUMAR
Abstract

Paraphrases are different ways of expressing the same content. Two sentences are said to be paraphrases if they are semantically equivalent. Identification of paraphrases has numerous applications, such as information extraction and question answering. Traditional systems use threshold values to decide whether two sentences are paraphrases. This threshold determination process is independent of the training data and may lead to incorrect paraphrase reasoning. To avoid threshold settings, we propose to use machine learning techniques. The advantages of an ML approach are its ability to account for a large mass of information and the possibility of incorporating different information sources, such as morphological, syntactic, and semantic information, in a single execution. With the objective of increasing the performance of the system and developing a machine learning approach for paraphrase identification, we scrutinize the influence of the combination of lexical and semantic information, as well as techniques for classifier combination.

Key-Words: Paraphrase, SVM, Natural Language Processing, n-grams, skip grams, cardinal number
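
As a rough illustration of the lexical information mentioned in the abstract, the sketch below computes n-gram and skip-gram overlap for a sentence pair. It is a minimal sketch under stated assumptions, not the authors' feature set: the function names, the whitespace tokenizer, the Jaccard overlap measure, and the `max_skip` parameter are all hypothetical choices made for this example.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def skip_bigrams(tokens, max_skip=2):
    """Return ordered token pairs with at most max_skip tokens between them."""
    return {
        (tokens[i], tokens[j])
        for i, j in combinations(range(len(tokens)), 2)
        if j - i - 1 <= max_skip
    }


def overlap(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(s1, s2):
    """Lexical feature vector for a sentence pair (illustrative only)."""
    # Assumption: lowercased whitespace tokenization stands in for real preprocessing.
    t1, t2 = s1.lower().split(), s2.lower().split()
    return [
        overlap(ngrams(t1, 1), ngrams(t2, 1)),        # unigram overlap
        overlap(ngrams(t1, 2), ngrams(t2, 2)),        # bigram overlap
        overlap(skip_bigrams(t1), skip_bigrams(t2)),  # skip-bigram overlap
    ]
```

In a learned setup, vectors like these would be passed to a trained classifier (the keywords mention SVM) together with semantic features, instead of being compared against a hand-set threshold.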

Similar resources

Paraphrase Identification on the Basis of Supervised Machine Learning Techniques

This paper presents a machine learning approach for paraphrase identification which uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. With the objective to increase the final performance of the system, we scrutinize the influence of the combination of lexica...

ASE@DPIL-FIRE2016: Hindi Paraphrase Detection using Natural Language Processing Techniques & Semantic Similarity Computations

The paper reports the approaches utilized and results achieved for our system in the shared task (in FIRE-2016) for paraphrase identification in Indian languages (DPIL). Since Indian languages have a complex inherent nature, paraphrase identification in these languages becomes a challenging task. In the DPIL task, the challenge is to detect and identify whether a given sentence pairs paraphrase...

AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders

We explore using recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system makes use of phrase-structure parse tree embeddings that are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on computing semantic s...

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

KEC@DPIL-FIRE2016: Detection of Paraphrases in Indian Languages (Tamil)

This paper presents a report on Detecting Paraphrases in Indian Languages (DPIL), in particular the Tamil language, by the team NLP@KEC of Kongu Engineering College. Automatic paraphrase detection is an intellectual task which has immense applications like plagiarism detection, new event detection, etc. Paraphrase is defined as the expression of a given fact in more than one way by means of dif...

Publication date: 2010